Performance Improvement of Web Page Genre Classification
Similar resources
Performance Improvement of Web Page Genre Classification
Given the dynamic nature of the web and the growing number of web pages, it is very difficult to quickly find the required pages among the thousands retrieved by a search engine. One solution to this problem is to classify web pages according to their genre. Automatic genre identification of web pages has become an important area in web page classification, because...
Web Page Genre Classification: Impact of n-Gram Lengths
Web pages can be distinguished by their topic and by their genre. Web page genres can help modern search engines focus on the user's information need. In this paper, web pages are represented using character n-grams. Character n-gram representation is language independent and allows automatic extraction of features from a web page. Character n-gram representation of a web pa...
Genre Classification of Web Pages
Genre classification means discriminating between documents by their form, their style, or their targeted audience. Put another way, genre classification is orthogonal to a classification based on the documents’ contents. While most existing investigations of automated genre classification are based on news article corpora, the idea here is applied to arbitrary Web pages. W...
Genre Classification of Web Documents
Retrieving relevant documents over the Web is an overwhelming task when search engines return thousands of Web documents. Sifting through these documents is time-consuming and sometimes leads to an unsuccessful search. One problem is that most search engines rely on matching a query to documents based solely on topical keywords. However, many users of search engines have a particular genre in m...
Automatic Web Page Classification
The aim of this paper is to describe a method for automatically classifying web pages into semantic domains, together with its evaluation. The classification method exploits machine learning algorithms and several morphological and semantic text processing tools. In contrast to general text document classification, web document classification often runs into problems with short web pages. In this p...
Journal
Journal title: International Journal of Computer Applications
Year: 2012
ISSN: 0975-8887
DOI: 10.5120/8457-2265